ABSTRACT


A suboptimal estimation scheme similar to the Kalman filter 
is described which makes use of scalar weighting factors instead 
of matrix factors. It is shown that the accuracy degradation of 
the suboptimal estimator is not too great for most cases of 
practical interest. Moreover, it readily lends itself to physi- 
cal interpretation. 
SUMMARY

A suboptimal scheme similar to the Kalman filter is investi- 
gated here. The basic idea underlying the new device is that 
scalar weighting factors instead of matrix factors are used in 
constructing the estimate. It is shown quantitatively that the
degradation in accuracy of the new estimator is, in most cases,
not too great (typically around a factor of two). Aside from
the possibility of simplifying the estimation procedure, the new 
device has the advantage of readily lending itself to physical 
interpretation. This, in turn, is shown to be useful in under- 
standing the associated Kalman filter and allowing some signifi- 
cant a priori testing on it. 

The treatment here is confined to the case where the dimen- 
sionality of the measurement vector is the same as that of the 
state vector, but it is suggested how the treatment could be 
extended to the general case. 


I. - INTRODUCTION 

The objective here is to investigate a certain type of esti- 
mator which is similar to the Kalman filter but is suboptimal in 
performance. The basic idea underlying the new device is that 
scalar weighting factors are used instead of matrix factors in 
construction of the estimate. It is shown that the degradation
in accuracy of the suboptimal filter is, in most cases, not too
great (typically around a factor of two). An obvious
advantage of the new filter is that, at least in certain cases,
it can offer a significant simplification in the estimation
procedure.

Another advantage is that the new filter readily lends itself 
to physical interpretation and this, in turn, may prove to be of 
value in understanding the associated Kalman filter. In parti- 
cular, it will be shown that by calculating a certain factor, it 
should be possible to obtain a rough idea, a priori, of the abso- 
lute effectiveness of the Kalman estimator. The effectiveness 
criterion developed here is compared with the designation "optimum",
which heretofore has been the prevalent concept in estimator
effectiveness. The a priori testing could be of practical value 
in constructing a computer program that would make the most effec- 
tive use of the available facilities. 


II. - PRELIMINARY MATHEMATICAL FORMULATION


This section is devoted to deriving the mathematical formulas
which form the basis of the investigation to be described.
The treatment here is limited to the case of discrete sampling
intervals where the measurement data (in the form of the
measurement vectors, $z(k-i)$) are continuously supplied to the
estimator at the discrete times, $t_{k-i}$, with a finite time
interval, $\Delta t_{k-i}$ $(= t_{k-i} - t_{k-i-1})$.

The Kalman-Bucy equations for this case are well known
(refs. 1, 2, and 3). The canonical equations take the form:*

$$x(k+1) = \Phi(k+1,k)\,x(k) + \Gamma_{k+1,k}\,u(k) \tag{1a}$$

$$z(k) = H_k\,x(k) + v(k) \tag{1b}$$


The estimator equations are:

$$\hat{x}(k+1 \mid k) = \Phi(k+1,k)\,\hat{x}(k \mid k) \tag{2a}$$

$$\hat{x}(k \mid k) = (I - K_k H_k)\,\hat{x}(k \mid k-1) + K_k\,z(k) \tag{2b}$$

where

$$K_k = \left[P(k \mid k-1)\,H_k^T\right]\left[H_k\,P(k \mid k-1)\,H_k^T + R_k\right]^{-1} \tag{2c}$$

The equation for the variance matrix, $P(k+1 \mid k)$, is:

$$P(k+1 \mid k) = \Phi(k+1,k)\,P(k \mid k)\,\Phi^T(k+1,k) + \Gamma_{k+1,k}\,Q_k\,\Gamma_{k+1,k}^T \tag{3a}$$

$$P(k \mid k) = W_k\,P(k \mid k-1) \tag{3b}$$


*In regard to notation, the symbols used here are standard (such
as employed in references 1 and 3) and are more or less
self-explanatory in equation sets (1) through (3). In general, an
upper case Greek or Latin letter will represent an $(n \times m)$
matrix. A lower case Latin letter will represent a $(1 \times n)$
vector. A lower case Greek letter will represent a scalar.
where

$$W_k \equiv (I - K_k H_k)$$

It is understood that the matrices $R_k$ and $Q_k$ are defined by
the following relations:

$$\operatorname{cov}[v(k), v(i)] = R_k\,\delta_{ik} \tag{3c}$$

$$\operatorname{cov}[u(k), u(i)] = Q_k\,\delta_{ik} \tag{3d}$$

$$\operatorname{cov}[u(k), v(i)] = 0 \tag{3e}$$
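
As a minimal sketch of equation sets (2) and (3), the following
routine carries out one estimator cycle for the scalar case
(n = 1), where every matrix reduces to a number. The function name
and the numerical values are illustrative assumptions only and do
not come from the references.

```python
def kalman_step(x_prev, p_prev, z, phi, gamma, h, q, r):
    """One cycle of Eqs. (2a)-(3b) for a scalar state and measurement."""
    # Prediction, Eqs. (2a) and (3a)
    x_pred = phi * x_prev
    p_pred = phi * p_prev * phi + gamma * q * gamma
    # Gain, Eq. (2c)
    k_gain = p_pred * h / (h * p_pred * h + r)
    # Weighting factor W_k = I - K_k H_k
    w = 1.0 - k_gain * h
    # Update, Eqs. (2b) and (3b)
    x_new = w * x_pred + k_gain * z
    p_new = w * p_pred
    return x_new, p_new, w


# Hypothetical measurement sequence and noise levels
x, p = 0.0, 1.0
for z in [1.2, 0.9, 1.1, 1.0]:
    x, p, w = kalman_step(x, p, z, phi=1.0, gamma=1.0, h=1.0, q=0.01, r=0.25)
print(x, p, w)
```

In this scalar setting the weighting factor W always lies between
zero and unity, which is the property the scalar factors introduced
later are meant to imitate.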


By making repeated use of the estimator equation set (2) for
the time instants $t_{k-i}$ (where $i = 0, 1, 2, 3, \ldots, \infty$),
it is possible to obtain an expression of the estimate which has
the form:

$$\hat{x}(k \mid k) = \sum_{i=0}^{\infty} \Omega(k,k-i)\left[I - W_{k-i}\right]\left[H_{k-i}^{-1}\,z(k-i)\right] \tag{4}$$

where

$$\Omega(k,k-i) = W_k\,\Phi(k,k-1)\,W_{k-1}\,\Phi(k-1,k-2) \cdots W_{k+1-i}\,\Phi(k+1-i,k-i) \quad \text{for } i \geq 1$$

$$\Omega(k,k) = I$$


It is, of course, apparent that the expression for $\hat{x}$ given
in Eq. (4) is strictly valid only if the inverse $H_{k-i}^{-1}$
exists. This assumption will be made here for the treatment which
follows. Physically, this means that a complete zero bias estimate,
$x_0(k \mid k)$, can be constructed from the contemporary data,
$z(k)$, alone and that the dimensions of $z(k)$ must be the same as
those of $x(k)$.
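
The step from equation set (2) to Eq. (4) can be sketched as
follows, assuming (as the text does) that the inverse of $H_k$
exists and using $W_k \equiv I - K_k H_k$:

```latex
\begin{aligned}
K_k\,z(k) &= K_k H_k \left[ H_k^{-1} z(k) \right]
           = (I - W_k)\left[ H_k^{-1} z(k) \right], \\
\hat{x}(k \mid k) &= W_k\,\Phi(k,k-1)\,\hat{x}(k-1 \mid k-1)
           + (I - W_k)\left[ H_k^{-1} z(k) \right].
\end{aligned}
```

Applying the second line in turn to the estimates at the earlier
instants generates the products $\Omega(k,k-i)$ that multiply each
earlier measurement.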

Although in the interest of simplicity only the case just
described is to be considered explicitly, it seems possible at
this time that the treatment developed here can be extended to the
general case.*

In extending the $i$ summation of Eq. (4) to infinity, it has
been assumed that the estimator has reached a "steady state"
where the initial measurement values, $z(0)$, no longer influence
the estimate, $\hat{x}$.

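In a manner similar to that used in Eq. (4), the repeated
application of the equation set (3) can bring the variance matrix
to the form:

$$P(k \mid k) = \sum_{i=1}^{\infty} \Omega(k,k-i)\left[D_{k-i}\right]\Omega^T(k,k-i) + \sum_{i=0}^{\infty} \Omega(k,k-i)\left[I - W_{k-i}\right]\left[E_{k-i}\right]\left[I - W_{k-i}\right]^T\Omega^T(k,k-i) \tag{5}$$

where

$$D_{k-i} \equiv \bar{\Gamma}_{k-i}\,Q_{k-i}\,\bar{\Gamma}_{k-i}^T$$

$$\bar{\Gamma}_{k-i} \equiv \Phi^{-1}(k+1-i,k-i)\,\Gamma_{k+1-i,k-i}$$

$$E_{k-i} \equiv \left[H_{k-i}^T\,R_{k-i}^{-1}\,H_{k-i}\right]^{-1}$$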


In studying Eq. (4) for the optimum estimate, $\hat{x}$, it is
possible to regard the matrix factors, $W_{k-i}$, as weighting
factors. It becomes very suggestive to replace these matrix factors
with scalar factors, and construct a new kind of estimate, $x_A$.

It can be shown that a zero bias estimate can indeed be
constructed to have the form:

$$x_A(k \mid k) = \sum_{i=0}^{\infty} \gamma_{k,i}\,\Phi(k,k-i)\left[H_{k-i}^{-1}\,z(k-i)\right] \tag{6}$$


*For details on how this might be done see Appendix C. 

However, it is necessary that the scalar factors, $\gamma_{k,i}$,
satisfy the following relation:

$$\sum_{i=0}^{\infty} \gamma_{k,i} = 1 \tag{7}$$


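Moreover, the covariance matrix,

$$P_A(k \mid k) \equiv \operatorname{cov}\left[x(k) - x_A(k \mid k),\; x(k) - x_A(k \mid k)\right]$$

corresponding to $x_A$ is given by:

$$P_A(k \mid k) = \sum_{i=1}^{\infty} \beta_{k,i}^2\,\Phi(k,k-i)\left[D_{k-i}\right]\Phi^T(k,k-i) + \sum_{i=0}^{\infty} \gamma_{k,i}^2\,\Phi(k,k-i)\left[E_{k-i}\right]\Phi^T(k,k-i) \tag{8a}$$

where

$$\beta_{k,i} \equiv \sum_{j=i}^{\infty} \gamma_{k,j} \tag{8b}$$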


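The scalar factors $\beta_{k,i}$ and $\gamma_{k,i}$ can be made to
take a more suggestive form if they are defined in terms of a new
scalar factor, $\alpha_{k-i}$, such that:

$$\beta_{k,i} = \alpha_k\,\alpha_{k-1}\,\alpha_{k-2} \cdots \alpha_{k+1-i} \quad \text{for } i \geq 1 \tag{9a}$$

$$\beta_{k,0} \equiv 1$$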


Thus, it can be shown that if:

$$\gamma_{k,i} \equiv \beta_{k,i} - \beta_{k,i+1} = \beta_{k,i}\,(1 - \alpha_{k-i}) \tag{9b}$$

both Eqs. (7) and (8b) are satisfied. Moreover, Eqs. (4) and
(6) for the estimates $\hat{x}$ and $x_A$ have a direct
correspondence since Eq. (6) results from Eq. (4) if $W_{k-i}$ is
replaced by $\alpha_{k-i} I$.
[This is also true for the variance Eqs. (5) and (8a)].
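
The relations (7), (8b), (9a), and (9b) can be checked numerically.
The sketch below assumes a constant weighting factor (the same
value of alpha at every instant), which is an assumption made here
only to keep the example short; the series is truncated at a finite
number of terms.

```python
def scalar_weights(alpha, n_terms):
    """Return beta_i = alpha**i (Eq. 9a) and gamma_i = beta_i*(1 - alpha) (Eq. 9b)."""
    betas = [alpha ** i for i in range(n_terms)]
    gammas = [b * (1.0 - alpha) for b in betas]
    return betas, gammas


alpha = 0.8  # hypothetical constant weighting factor
betas, gammas = scalar_weights(alpha, 200)

total = sum(gammas)            # Eq. (7): should be close to 1
tail_from_3 = sum(gammas[3:])  # Eq. (8b) with i = 3: should be close to beta_3
print(total, tail_from_3, betas[3])
```

The truncated sums differ from their limits only by a term of order
alpha raised to the number of terms kept.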

At this point, a few speculative remarks about the scalar
weighting factor, $\alpha_{k-i}$, might be of interest.

Intuitively, it seems reasonable to assume that when the
$\alpha$'s are adjusted to make $x_A$ an optimum (i.e., when
$\operatorname{tr}[P_A(k \mid k)]$ is a minimum), $\alpha_{k-i}$
will bear some close relationship to $W_{k-i}$, such as having a
value* near $(1/n)\operatorname{tr}(W_{k-i})$, the mean
characteristic root of $W_{k-i}$. This assumption will later be
shown to be reasonably valid at least for the case of greatest
practical interest. It follows from this that $\alpha_{k-i}$ should
always have a value between zero and unity. This result is also a
consequence of the consideration that the infinite series
expression for $\hat{x}$ in Eq. (4) must converge so that a "steady
state" is reached (i.e., a condition is reached where the estimate
is no longer influenced by the initial $z(k-i)$ data).


III. - QUANTITATIVE COMPARISON OF ESTIMATORS 

In this section, the suboptimal filter being studied here 
will be compared with its corresponding Kalman filter by evaluating
the two performance indices $\operatorname{tr}[P(k \mid k)]$ and $\operatorname{tr}[P_A(k \mid k)]$.
This will actually be done only for the two extreme cases of high 
"predictability" and low "predictability". The general case is 
very difficult to treat and moreover, the high "predictability" 
case is usually the one of greatest practical interest. 

Low Predictability Case 

It can be shown** that for the case of low "predictability"
(or more explicitly where $\|D_{k-i}\| \gg \|E_{k-i}\|$ for all $i$)***
that:

$$W_k \cong \left[E_k\,\bar{P}_k^{-1}\right]\left\{I - [\,\cdots\,]\right\} \tag{10}$$


*Where $\operatorname{tr}(A) \equiv$ Trace $A$ and $n$ is the
dimensionality of the square matrix.

**See Appendix A for details of the calculation.

***Where $\|A\| = [\operatorname{tr}(A^T A)]^{1/2}$.
where

$$\bar{P}_k \equiv \left[H_k^T\,R_k^{-1}\,H_k\right]^{-1} + \Gamma_{k,k-1}\,Q_{k-1}\,\Gamma_{k,k-1}^T$$

From Eq. (10), and the approximate relationship of $\alpha_{k-i}$
to $W_k$ stated in Section II, it would follow that for this case
$\alpha_{k-i} \ll 1$.

If Eq. (10) is used in evaluating $P(k \mid k)$ from Eq. (3b)
along with the approximation for $P(k \mid k-1)$ developed in
Eq. (5a) in Appendix A, the result is:

$$P(k \mid k) \cong \left[E_k\right]\left\{I - \left[E_k^{-1}\,\bar{P}_k\right]^{-1}\right\} \tag{11}$$

In studying Eq. (11), it can be seen that the Kalman variance
matrix, $P(k \mid k)$ (which is also the system error matrix), is
determined almost entirely by the measurement error matrix, $R_k$,
corresponding to the contemporary measurement vector, $z(k)$
(since the second matrix in the bracket is small compared to $I$).
This result can be understood on purely physical grounds since
as $R_k$ grows smaller (and $\|E_k\|$ becomes small compared to
$\|D_k\|$), the optimum estimate for the present instant of time,
$t_k$, should be based more on $z(k)$ and less on any of the
$z(k-i)$ $(i \geq 1)$. The uncertainty in "updating" will start to
produce errors large compared to $E_k$ and when this happens, the
$z(k-i)$ $(i \geq 1)$ data can have little value in determining the
optimum estimate and therefore must be excluded. Thus, for the case
of low "predictability", the system error should be nearly the same
as the measurement error and the Kalman filter provides little
improvement over an estimator using only the current measurement
data, $z(k)$. Since, as seen above, $\alpha_{k-i} \ll 1$ for this
case, it can readily be shown that $P_A(k \mid k) \cong E_k$ and the
same statement can be made for the suboptimal filter. It is then
apparent that the two performance indices
$\operatorname{tr}[P(k \mid k)]$ and $\operatorname{tr}[P_A(k \mid k)]$
are nearly equal for low "predictability" cases.
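
The low "predictability" limit can be illustrated for the scalar
case. The sketch below iterates the variance equations (3a) and
(3b) to a steady state with H = 1 (so that E = R) and with
Phi = Gamma = 1 (so that D = Q); these simplifications and the
numerical values are assumptions made for illustration.

```python
def steady_state_variance(phi, q, e, n_iter=200):
    """Iterate Eqs. (3a), (2c), (3b) for n = 1 with H = 1 and Gamma = 1."""
    p = e
    w = 1.0
    for _ in range(n_iter):
        p_pred = phi * p * phi + q   # Eq. (3a)
        w = e / (p_pred + e)         # W = 1 - K H, with K from Eq. (2c)
        p = w * p_pred               # Eq. (3b)
    return p, w


# D = q is one hundred times E, i.e. low "predictability"
p_low, w_low = steady_state_variance(phi=1.0, q=100.0, e=1.0)
print(p_low, w_low)
```

With these values the steady-state variance is within about one
percent of E and the weighting factor W is of the order of 0.01, in
line with the qualitative argument above.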

High-Predictability Case 

The case of high "predictability" (i.e., where
$\|D_{k-i}\| \ll \|E_{k-i}\|$) to be considered in this section is
much more difficult to treat generally. Some initial assumptions
are now to be made which should not seriously reduce the generality
of the treatment but will greatly simplify the calculations. The
assumptions are that for $i = 0, 1, 2, \ldots, \infty$, the
following relations are true:

$$t_{k-i} - t_{k-i-1} = \Delta t; \quad D_{k-i} = D; \quad E_{k-i} = E;$$

$$\Phi(k-i,k-i-1) = e^{F\Delta t}; \quad W_{k-i} = I - \Delta W; \quad \alpha_{k-i} = \alpha$$


The matrices, $D$, $E$, $F$, and $\Delta W$ are constants. All of
the foregoing assumptions are reasonable if the estimator has
reached its "steady state", where the various parameters show
little variation in a time interval, $N_e\Delta t$ [where $N_e$ is
the "effective" number of contributions to the estimate from the
various $z(k-i)$].

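Using the foregoing assumptions in Eqs. (5) and (8a) for
the two variance matrices, the result will be:

$$P(k \mid k) = \sum_{i=1}^{\infty}\left[(I - \Delta W)\,e^{F\Delta t}\right]^i\left[D\right]\left[e^{F^T\Delta t}\,(I - \Delta W^T)\right]^i + \sum_{i=0}^{\infty}\left[(I - \Delta W)\,e^{F\Delta t}\right]^i\left[(\Delta W)\,E\,(\Delta W^T)\right]\left[e^{F^T\Delta t}\,(I - \Delta W^T)\right]^i \tag{12}$$

$$P_A(k \mid k) = \sum_{i=1}^{\infty}(\alpha)^{2i}\left[e^{iF\Delta t}\,D\,e^{iF^T\Delta t}\right] + \sum_{i=0}^{\infty}(\alpha)^{2i}\,(1 - \alpha)^2\left[e^{iF\Delta t}\,E\,e^{iF^T\Delta t}\right] \tag{13}$$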


The following relationship* will prove to be of value in the
developments of this section:

$$(I - W_k) = P(k \mid k)\,E_k^{-1} \tag{14}$$


*This relationship can readily be derived from Eq. (VIII) of
ref. 3.
As will be seen later, as $D \to 0$, $P(k \mid k) \to 0$. Thus, it
follows that as $[\|D\|/\|E\|] \to 0$, $\Delta W \to 0$ and moreover
$(1 - \alpha) \to 0$.

It can be shown that for the "high-predictability" case, the
infinite sums of Eqs. (12) and (13) can be approximated by
infinite integrals with the resulting errors being only of the
order of $\|\Delta W\|$ and $(1 - \alpha)$. The actual integral for
$P(k \mid k)$ would be of the form:

$$P(k \mid k) = \int_0^{\infty} dy\; e^{(\Delta F - \Delta W)y}\left[D + (\Delta W)\,E\,(\Delta W^T)\right]e^{(\Delta F^T - \Delta W^T)y} \tag{15}$$

where

$$\Delta F \equiv F\Delta t$$


To obtain Eq. (15) from Eq. (12), the $D$ integration lower
limit was extended from 1 to 0, and the quantity $(I - \Delta W)$
was approximated by $e^{-\Delta W}$. Both of these operations would
introduce errors only of the order of $\|\Delta W\|$. Similarly, the
integral form for $P_A(k \mid k)$ can be shown to be:

$$P_A(k \mid k) = \int_0^{\infty} dy\; e^{(\Delta F - \Delta\alpha)y}\left[D + (\Delta\alpha)^2 E\right]e^{(\Delta F^T - \Delta\alpha)y} \tag{16}$$

where

$$\Delta\alpha \equiv -\log\alpha \cong (1 - \alpha)$$


Again, the $D$ integration lower limit was extended from 1 to
0, and the error here is of the order of $\Delta\alpha$. The
evaluation of integrals of the form given in Eqs. (15) and (16) has
been treated extensively*, and the results can be expressed in the
form of the following sets of algebraic matrix equations:

$$(\Delta W - \Delta F)\,P + P\,(\Delta W^T - \Delta F^T) = D + (\Delta W)\,E\,(\Delta W^T) \tag{17}$$


*See, for example, "Introduction to Matrix Analysis" by R. Bellman, 
page 231; McGraw-Hill Book Company, Inc., 1960. 
$$(\Delta\alpha - \Delta F)\,P_A + P_A\,(\Delta\alpha - \Delta F^T) = D + (\Delta\alpha)^2 E \tag{18}$$


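As indicated by Eqs. (17) and (18), the operation of main
interest here is obtaining explicit forms of $\operatorname{tr}(P)$
and $\operatorname{tr}(P_A)$ and comparing them. To do this, it is
expedient to calculate the quantity, $\Delta P$ $(\equiv P_A - P)$,
from these equations. Thus, if Eq. (17) is subtracted from Eq. (18)
and the relation of Eq. (14) [$P = (\Delta W)E$] is used, it is
possible to obtain the following relation:

$$(\Delta\alpha - \Delta F)\,\Delta P + \Delta P\,(\Delta\alpha - \Delta F^T) + \tfrac{1}{2}(\Delta\alpha - \Delta W)(P - \Delta\alpha E) + \tfrac{1}{2}(P - \Delta\alpha E)(\Delta\alpha - \Delta W^T) = 0 \tag{19}$$

Since $P = E(\Delta W^T)$, it follows from Eq. (19) that:

$$\operatorname{tr}\left[(\Delta\alpha - \Delta F)\,\Delta P\right] = \tfrac{1}{2}\operatorname{tr}\left[E\,(\Delta W^T - \Delta\alpha)^2\right] \tag{20}$$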

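Eq. (20) can be changed to a more suggestive form by again using
the relation of Eq. (14) so that it becomes:

$$\operatorname{tr}\left[(I - B)\,\Delta P\right] = \frac{1}{2\Delta\alpha}\left[\operatorname{tr}(PE^{-1}P) - (2\Delta\alpha)\operatorname{tr}(P) + (\Delta\alpha)^2\operatorname{tr}(E)\right] \tag{21}$$

where

$$B \equiv \frac{1}{2\Delta\alpha}\left(\Delta F + \Delta F^T\right)$$

The l.h.s. of Eq. (21) results from the fact that $\Delta P$ is a
symmetric matrix.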
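
The role of the symmetry of $\Delta P$ in the l.h.s. of Eq. (21)
can be spelled out as follows; this is only a sketch of the
intermediate step, using the fact that the trace of a product with
a symmetric matrix is unchanged when the other factor is
transposed:

```latex
\begin{aligned}
\operatorname{tr}\!\left[(\Delta\alpha - \Delta F)\,\Delta P\right]
 &= \Delta\alpha\,\operatorname{tr}(\Delta P)
  - \tfrac{1}{2}\operatorname{tr}\!\left[(\Delta F + \Delta F^{T})\,\Delta P\right] \\
 &= \Delta\alpha\,\operatorname{tr}\!\left[(I - B)\,\Delta P\right],
 \qquad B \equiv \frac{1}{2\Delta\alpha}\left(\Delta F + \Delta F^{T}\right).
\end{aligned}
```

Dividing Eq. (20) by $\Delta\alpha$ and expanding the square on its
r.h.s. then gives the form of Eq. (21).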

As can be seen from Eq. (21), $\operatorname{tr}(\Delta P)$ is a
function of the scalar, $\Delta\alpha$; and it will assume a minimum
value for a certain choice of $\Delta\alpha$. In general, the
variation of the matrix factor, $(I - B)$, will not greatly affect
$(\Delta\alpha)_m$, the value of $\Delta\alpha$ for which
$\operatorname{tr}(\Delta P)$ is a minimum; so for simplicity,
$(\Delta\alpha)_m$ is to be determined by minimizing the r.h.s. of
Eq. (21). This can be done as a straightforward differentiation
problem, but it is more suggestive to do it by making the following
substitution:

$$\Delta\alpha = \delta_0\sqrt{\eta} \tag{22}$$

where

$$\delta_0 \equiv \left[\operatorname{tr}(PE^{-1}P)/\operatorname{tr}(E)\right]^{1/2}$$

When Eq. (22) is substituted in Eq. (21) and both sides are
divided by $\operatorname{tr}(P)$, the result is:

$$\frac{\operatorname{tr}\left[(I - B)\,\Delta P\right]}{\operatorname{tr}(P)} = \left[\tfrac{1}{2}\left(\sqrt{\eta} + 1/\sqrt{\eta}\right)\sqrt{\xi} - 1\right] \tag{23}$$

where

$$\xi \equiv \frac{\operatorname{tr}(E)\cdot\operatorname{tr}(PE^{-1}P)}{\operatorname{tr}^2(P)}$$


It is evident that the r.h.s. of Eq. (23) (which represents
a rough measure of the percentage deviation of $P_A$ from $P$) is a
minimum when the scalar parameter, $\eta = 1$, so that
$(\Delta\alpha)_m = \delta_0$. Moreover, the minimum is fairly broad
since $\eta$ can vary from 1/2 to 2 and the factor multiplying
$\sqrt{\xi}$ on the r.h.s. of Eq. (23) will stay within about six
percent of its minimum value.
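
The breadth of the minimum can be checked directly from Eq. (23).
The sketch below evaluates the factor that multiplies the square
root of xi, together with the full r.h.s. for one hypothetical
value of xi (the value 4.0 is an assumption chosen only for
illustration).

```python
import math


def eta_factor(eta):
    """Factor multiplying sqrt(xi) on the r.h.s. of Eq. (23)."""
    return 0.5 * (math.sqrt(eta) + 1.0 / math.sqrt(eta))


def relative_deviation(eta, xi):
    """Full r.h.s. of Eq. (23)."""
    return eta_factor(eta) * math.sqrt(xi) - 1.0


for eta in (0.5, 1.0, 2.0):
    print(eta, eta_factor(eta), relative_deviation(eta, xi=4.0))
```

The factor equals unity at eta = 1 and is the same at eta = 1/2 and
eta = 2, because it is unchanged when eta is replaced by 1/eta.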

It is to be noted that both symmetric matrices $\Delta P$ and
$(I - B)$ must be positive definite*. This is true for $\Delta P$
because $P_A$ must always exceed $P$, the possible minimum. It is
true for $(I - B)$, since otherwise the integral in Eq. (16) will
become

*It should be noted at this point that the tentative assumption
made in Section II about the magnitude of $\alpha_{k-i}$ can now be
verified. From Eq. (22) it follows that
$(1 - \alpha) = [\operatorname{tr}(\Delta W\,E\,\Delta W^T)/\operatorname{tr}(E)]^{1/2}$
whereas at the beginning of Section III it was, in effect,
assumed that $(1 - \alpha) = (1/n)\operatorname{tr}(\Delta W)$. It
can be shown that, in general, the two values will roughly agree
(to within a factor of two).
infinite. It can therefore be shown that*:

$$\lambda_M\operatorname{tr}(\Delta P) \geq \operatorname{tr}\left[(I - B)\,\Delta P\right] \geq \lambda_m\operatorname{tr}(\Delta P) \tag{24a}$$

where

$$\lambda_M \geq \lambda_m > 0$$


$\lambda_M$ is the maximum characteristic root of the matrix
$(I - B)$ and $\lambda_m$ is the minimum characteristic root. Hence
by evaluating $\lambda_M$ and $\lambda_m$, the range of variation of
$\operatorname{tr}(\Delta P)/\operatorname{tr}(P)$ can be determined
from Eqs. (23) and (24a). However, since in what follows only a
rough estimate of $\operatorname{tr}(\Delta P)/\operatorname{tr}(P)$
is desired, Eq. (24a) suggests that the effects of the matrix
factor, $(I - B)$, can be approximated with sufficient accuracy (at
least in most cases) by means of the following relation:

$$\operatorname{tr}\left[(I - B)\,\Delta P\right] \cong \lambda_B\operatorname{tr}(\Delta P) \tag{24b}$$

where

$$\lambda_B \equiv (1/n)\operatorname{tr}(I - B) = (1/n)\sum_{i=1}^{n}\lambda_i$$

The $\lambda_i$ are the characteristic roots of $(I - B)$.

It is apparent in studying Eq. (23) that the parameter, $\xi$,
is of central importance in determining the value of
$\operatorname{tr}(\Delta P)$. It is first to be noted that
$\xi \geq 1$, since otherwise $\operatorname{tr}(\Delta P)$ might be
negative. Moreover (assuming that $\eta = 1$ so the factor in front
of $\sqrt{\xi}$ is unity), it is seen that the quantity
$[\operatorname{tr}(\Delta P)/\operatorname{tr}(P)]$, which is the
measure of the accuracy degradation, has a value roughly equal to
$[(\sqrt{\xi} - 1)/\lambda_B]$. The value of $\xi$ in any particular
case, of course, depends on the nature of the estimation problem
(i.e., on the exact form of the matrices $D$, $E$, and $F$); and it
can assume values ranging from those near unity to others much
greater than unity. It is felt that it might be of value to
estimate the parameter, $\xi$, for a couple of selected examples
which should illustrate the behavior of this parameter. This is
to be undertaken in what follows, but first it is to be noted
that in order to evaluate $\xi$, it is necessary to calculate $P$.


*See Appendix B for details of the derivation. 
The operation can be accomplished by solving the system of
algebraic equations derived from Eqs. (14) and (17) and represented
in matrix form as:

$$D + P(\Delta F^T) + (\Delta F)P - PE^{-1}P = 0 \tag{25}$$


In obtaining a unique form of $P$ from Eq. (25), the supplementary
condition that as $D \to 0$, $P \to 0$ is to be used.

The matrix Eq. (25) is seen to closely resemble Eq. Set (3),
the Kalman variance equations [it becomes identical if 0 on the
r.h.s. is replaced by $\Delta t\,(dP/dt)$]. The main difference, of
course, is that it is a set of algebraic rather than differential
equations. The details of obtaining a practical solution to
Eq. (25) are left for Section V. In this section, it is
simply assumed that $P$ can be made available from Eq. (25).
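
For the scalar case (n = 1) both Eq. (25) and Eq. (18) can be
solved in closed form, which gives a simple consistency check: when
the weighting parameter is set to its optimum value from Eq. (22)
with eta = 1, the suboptimal variance should coincide with the
Kalman variance. The function names and the numerical values below
are illustrative assumptions.

```python
import math


def kalman_p_scalar(d, e, df):
    """Root of Eq. (25) for n = 1, d + 2*df*p - p*p/e = 0, with p -> 0 as d -> 0."""
    return e * (df + math.sqrt(df * df + d / e))


def suboptimal_p_scalar(d, e, df, d_alpha):
    """Eq. (18) for n = 1: 2*(d_alpha - df)*p_a = d + d_alpha**2 * e."""
    return (d + d_alpha ** 2 * e) / (2.0 * (d_alpha - df))


d, e, df = 0.01, 1.0, -0.02    # hypothetical D, E, and Delta F (Delta F < 0)
p = kalman_p_scalar(d, e, df)
delta_0 = p / e                # [tr(P E^-1 P)/tr(E)]^(1/2) reduces to P/E for n = 1
p_a = suboptimal_p_scalar(d, e, df, delta_0)
p_a_detuned = suboptimal_p_scalar(d, e, df, 2.0 * delta_0)
print(p, p_a, p_a_detuned)
```

A detuned weighting parameter (here twice the optimum) gives a
larger variance, while the optimum value reproduces the Kalman
result, consistent with xi = 1 in the scalar case.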

Having $p_{ij}$ (the components of $P$), the parameter $\xi$ can be
calculated and can be shown to take the explicit form:

$$\xi = \frac{\left[\sum_{i} e_{ii}\right]\left[\sum_{i,j} p_{ij}^2/e_{jj}\right]}{\left[\sum_{i} p_{ii}\right]^2} \tag{26}$$

where $i, j = 1, 2, 3, \ldots, n$.

In obtaining Eq. (26), the simplifying assumption has been
made that the matrix $E$ is diagonal. This assumption is actually
not too restrictive because it covers many, if not most, cases
of practical interest. It is to be noted that
$p_{ij} = c_{ij}[p_{ii}\,p_{jj}]^{1/2}$, where $-1 \leq c_{ij} \leq 1$,
which follows from the definition of $P(k \mid k)$ $(= P)$.

The illustrative case to be considered now is where $c_{ij} = 1$
and all the components of $E$, $e_{ii}$, are nearly equal. It is
readily seen from Eq. (26) that $\xi \cong n$ here, and the
degradation factor is $[(\sqrt{n} - 1)/\lambda_B]$. Thus, unless the
matrix dimensionality is very large, it is seen that the accuracy
degradation for the suboptimal estimator need not be too great for
this case.

The next case to be considered is where one particular component
of $E$ (say $e_{11}$) is much larger (by at least a factor of $2n$)
than the others and, moreover, this will cause its corresponding
$P$ component, $p_{11}$, to be much larger (by at least a factor of
$2n[e_{11}/e_{ii}]^{1/2}$) than the others. For these conditions, it
can be seen that $\xi \cong 1$ and the accuracy degradation factor
is much smaller than unity. For this case, the suboptimal estimator
is very nearly as good as the Kalman filter.
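
The two illustrative cases can be reproduced numerically from the
definition of xi, specialized to a diagonal E. The helper names and
the numerical matrices below are assumptions chosen for
illustration; the dimensionality is taken as n = 4.

```python
import math


def xi_diagonal_e(p, e_diag):
    """xi = tr(E) tr(P E^-1 P) / tr(P)^2 for symmetric P and diagonal E."""
    n = len(e_diag)
    tr_e = sum(e_diag)
    tr_p = sum(p[i][i] for i in range(n))
    tr_pep = sum(p[i][j] ** 2 / e_diag[j] for i in range(n) for j in range(n))
    return tr_e * tr_pep / tr_p ** 2


def correlated_p(p_diag, c):
    """Build P with p_ii on the diagonal and c*sqrt(p_ii*p_jj) off it."""
    n = len(p_diag)
    return [[p_diag[i] if i == j else c * math.sqrt(p_diag[i] * p_diag[j])
             for j in range(n)] for i in range(n)]


# Case 1: full correlation (c_ij = 1) and equal components of E
xi_worst = xi_diagonal_e(correlated_p([1.0] * 4, 1.0), [2.0] * 4)

# Case 2: one dominant component of E and of P, no cross correlation
xi_best = xi_diagonal_e(correlated_p([1e6, 1.0, 1.0, 1.0], 0.0),
                        [1e3, 1.0, 1.0, 1.0])
print(xi_worst, xi_best)
```

The first case gives xi equal to n, and the second gives a value
only slightly above unity.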

The two examples treated above were chosen because the 
first case represents an easily realizable situation where the 
suboptimal estimator would be performing at nearly its worst 
compared to a Kalman filter, and the second case represents a 
situation where the suboptimal estimator is performing nearly 
the best that is possible. In general, the situation might be 
roughly summarized by saying that the accuracy degradation factor 
will have a value of around 2. This is to say that the agreement
between $P_A(k \mid k)$ and $P(k \mid k)$ is usually fairly close.


IV. - ESTIMATOR EFFECTIVENESS 

One of the benefits to be derived from the viewpoint devel- 
oped in the previous section is that it allows an absolute cri- 
terion in judging the effectiveness of an estimator. Heretofore, 
the designation, "optimum", has been prevalent in the question 
of estimator effectiveness. As will be seen, this is actually 
a vague criterion which is much more meaningful to a mathematician 
than to a physicist or engineer. It is proposed here that the 
standard against which any estimator should be compared should 
be the estimator which uses only contemporary data [where the
estimate is $H_k^{-1}z(k)$]. In this light, the practical limitations
of the designation, "optimum", become apparent when it is noticed 
that it is perfectly possible that one estimator be only about 
1.1 times as accurate as its corresponding contemporary data 
estimator, while another estimator be 100 times as accurate and 
both could be optimum estimators. 

It is also proposed here that the parameter which is the
measure of estimator effectiveness be $N_e$, the "effective" number
of contributions to the optimum estimate. From the central limit
theorem of probability, it would follow that the mean-square error
in the optimum estimate is roughly equal to the mean-square error
in any one contribution divided by $N_e$. This would seem to fit in
with the proposal to use the contemporary data estimator as a
standard, since $N_e$ can therefore be defined by the relation
$\operatorname{tr}[P(k \mid k)] = (1/N_e)\operatorname{tr}(E_k)$.

It would then follow from Eqs. (22) and (23) that

$$N_e \equiv \frac{\operatorname{tr}(E)}{\operatorname{tr}(P)} = \left(\sqrt{\xi}/\delta_0\right) \tag{27}$$
Thus, it is seen that $N_e$ can indeed be the measure of the
effectiveness of an estimator, since $\operatorname{tr}(E_k)$ is the
measure of the overall error of a contemporary data estimator just
as $\operatorname{tr}(P)$ is the overall measure of its
corresponding Kalman filter. The case of the Kalman filter is given
by Eq. (27); and from this it would follow that the error reduction
can be very large, since (as will be seen) $\delta_0$ can be made
quite small. The actual calculation of $N_e$ requires the knowledge
of the covariance matrix, $P$ $[= P(k \mid k)]$. The evaluation of
$P$ is to be the main concern of the following section, and as will
be seen, involves an iteration solution to the matrix algebraic
equation, Eq. (25). Although there would be only one evaluation,
the actual performance could be quite involved and generally
requires the services of a digital computer. If only a rough
a priori evaluation of $N_e$ is sufficient, the calculation could
become quite simple. Drawing from the results of the following
section, one approach is to approximate $P$ by $P^{(0)}$ of Eq. (34)
(in Section V) and use this in the definition of $N_e$ given in
Eq. (27). $N_e$ then becomes the quantity $(1/\sigma_0)$ [defined in
Eq. (34)] which can easily be calculated. It is of interest to note
that in certain cases [i.e., where
$[\Delta t\,\operatorname{tr}(FE)]^2 \ll \operatorname{tr}(E)\cdot\operatorname{tr}(D)$],
$(1/\sigma_0)$ will simplify to
$[\operatorname{tr}(E)/\operatorname{tr}(D)]^{1/2}$.

Using this rough estimate of $N_e$, it is apparent that as the
"predictability" increases (i.e., as $\|D\|/\|E\| \to 0$), $N_e$ also
increases and eventually approaches infinity.


V. - SUBOPTIMAL ESTIMATOR 

As will be shown in the following development, obtaining 
the optimum value of the scalar weighting factor, a, in the sub- 
optimal scheme proposed here requires only the knowledge of the 
scalar factor tr(FP) not the whole covariance matrix, P, as in 
the Kalman filter. Thus, only one scalar quantity is needed
instead of $n^2$ quantities. Moreover, as can be seen from Eq. (23),
$\operatorname{tr}(FP)$ need not, in most cases, be evaluated too
accurately (since it was shown that the parameter, $\eta$, could
vary by a factor of 2 without greatly affecting the results). All
of this would
suggest that it should be possible to effect a significant simpli- 
fication in the calculational procedure (especially for an x of 
a large number of dimensions) in instituting the suboptimal 
scheme rather than the Kalman filter. In what follows, one pro- 
posal will be suggested as to how this simplification can be 
realized. There probably exist other approaches which could 
perhaps even be more advantageous. 

The implementation of the suboptimal estimator being proposed
here is more or less the same as that of the Kalman filter, except
that instead of Eq. (2b), the corresponding estimator equation
is:

$$x_A(k \mid k) = \left[\alpha_{k-\ell}\right]x_A(k \mid k-1) + \left[1 - \alpha_{k-\ell}\right]\left[H_k^{-1}z(k)\right] \tag{28}$$

where

$$\alpha_{k-\ell} = \left[1 - \delta_0\right]$$


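$\alpha_{k-\ell}$ in Eq. (28) is the optimum value of the weighting
factor, $\alpha$. It is to be calculated at the instant $t_{k-\ell}$
as determined by Eq. (22), which in turn requires the knowledge of
$\operatorname{tr}(PE^{-1}P)$. From Eq. (25) it follows that:

$$\delta_0 = \left\{\frac{\operatorname{tr}(D_{k-\ell}) + (2\Delta t)\operatorname{tr}\left[F_{k-\ell}\,P(k-\ell \mid k-\ell)\right]}{\operatorname{tr}(E_{k-\ell})}\right\}^{1/2} \tag{29}$$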


Since it has been assumed in the previous development that
the basic estimator matrices $D$, $E$, and $F$ are only slowly
varying, staying practically constant in a time interval,
$N_e\Delta t$, the weighting factor, $\alpha_{k-\ell}$, need not be
calculated more frequently than at intervals of $N_e\Delta t$
apart. Hence, it follows that the same $\alpha_{k-\ell}$ can be
used over the whole time interval $t_{k-\ell}$ to $t_k$ (where
$\ell = N_e\Delta t$).
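
A minimal sketch of the recursion of Eq. (28) for the scalar case is
given below. It assumes that the prediction from the previous
estimate follows Eq. (2a), that the weighting factor is held fixed
over the whole run, and that a value of delta_0 is already
available; the function name and all numbers are hypothetical.

```python
def suboptimal_update(x_prev, z, alpha, phi=1.0, h=1.0):
    """One cycle of Eq. (28) for n = 1, with the prediction taken from Eq. (2a)."""
    x_pred = phi * x_prev                             # x_A(k | k-1)
    return alpha * x_pred + (1.0 - alpha) * (z / h)   # x_A(k | k)


delta_0 = 0.1            # hypothetical value obtained from Eq. (29)
alpha = 1.0 - delta_0    # weighting factor used in Eq. (28)

x_a = 0.0
for _ in range(200):
    x_a = suboptimal_update(x_a, z=5.0, alpha=alpha)
print(x_a)
```

With a constant, noise-free measurement the estimate approaches the
measured value, and the number of cycles needed is of the order of
1/delta_0, which is the sense in which N_e counts the "effective"
contributions.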

As indicated before, the main advantage of the suboptimal 
scheme resides in the simplification in the required evaluation 
of the covariance matrix, P. The proposed evaluation procedure 
is now to be presented in detail and its advantages pointed out. 

The evaluation is accomplished by the approximate solution
of the set of algebraic equations, Eq. (25), and it is to be
noted that this equation set is valid for any given instant of
time. Thus, if the matrices, $D$, $E$, and $F$, all correspond to
the instant, $t_{k-\ell}$; then $P = P(k-\ell \mid k-\ell)$.

The method to be used in solving for $P$ in Eq. (25) is an
iteration procedure where the $(i+1)$th iteration of $P$ is obtained
from the $(i)$th iteration by the relation:

$$P^{(i+1)} = P^{(i)} + \Delta P_i \tag{30}$$
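By substituting Eq. (30) into Eq. (25), it can be shown that
$\Delta P_i$ is in turn determined by the relation:

$$\left[\left(P^{(i)} + \tfrac{1}{2}\Delta P_i\right)E^{-1} - \Delta F\right]\Delta P_i + \Delta P_i\left[E^{-1}\left(P^{(i)} + \tfrac{1}{2}\Delta P_i\right) - \Delta F^T\right] = \Delta V^{(i)} \tag{31}$$

where

$$\Delta V^{(i)} \equiv D + \Delta F\,P^{(i)} + P^{(i)}\,\Delta F^T - P^{(i)}E^{-1}P^{(i)}$$

If $\Delta\tilde{P}_i$ is defined so as to satisfy the relation:

$$\left[\left(P^{(i)} + \tfrac{1}{2}\Delta P_{i-1}\right)E^{-1} - \Delta F\right]\Delta\tilde{P}_i = \tfrac{1}{2}\,\Delta V^{(i)} \tag{32a}$$

and thus

$$\Delta\tilde{P}_i = \tfrac{1}{2}\left[\left(P^{(i)} + \tfrac{1}{2}\Delta P_{i-1}\right)E^{-1} - \Delta F\right]^{-1}\Delta V^{(i)} \tag{32b}$$

It can be seen that $\Delta\tilde{P}_i$ is actually a solution to
Eq. (31), but it is not symmetric (the matrix $\Delta V^{(i)}$
however is symmetric). It is to be expected that an approximate
symmetric solution to Eq. (31) is of the form:

$$\Delta P_i = \tfrac{1}{2}\left(\Delta\tilde{P}_i + \Delta\tilde{P}_i^T\right) \tag{33}$$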


Eqs. (30), (32b), and (33) form the working relations for the
iteration cycle. There is now left only the task of finding the
initial trial function, $P^{(0)}$. This is to be determined from
the set of relations:

$$P^{(0)} = \sigma_0 E \tag{34}$$

and

$$\Delta P_{-1} = 0$$

where

$$\sigma_0 \equiv \frac{(\Delta t)\,\left|\operatorname{tr}(FE)\right|}{\operatorname{tr}(E)}\left\{\left[1 + \frac{\operatorname{tr}(E)\operatorname{tr}(D)}{(\Delta t)^2\operatorname{tr}^2(FE)}\right]^{1/2} - 1\right\}$$

The scalar factor, $\sigma_0$, has been chosen so that
$\operatorname{tr}(\Delta V^{(0)}) = 0$, assuming that
$\operatorname{tr}(FE) < 0$.
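
The choice of the scalar factor can be checked from the traces
alone. With the initial trial function proportional to E, the trace
of the residual of Eq. (25) is a quadratic in the scalar factor; the
sketch below assumes the positive root of that quadratic for
tr(FE) < 0 and verifies that the trace of the residual vanishes. The
function name and the numerical traces are hypothetical.

```python
import math


def sigma_0(tr_e, tr_d, tr_fe, dt):
    """Positive root of tr(D) + 2*s*dt*tr(FE) - s**2 * tr(E) = 0, for tr(FE) < 0."""
    a = dt * abs(tr_fe) / tr_e
    return a * (math.sqrt(1.0 + tr_e * tr_d / (dt * tr_fe) ** 2) - 1.0)


tr_e, tr_d, tr_fe, dt = 3.0, 0.03, -0.6, 0.1   # hypothetical traces and time step
s0 = sigma_0(tr_e, tr_d, tr_fe, dt)

# Trace of the residual of Eq. (25) evaluated at P = s0 * E
residual = tr_d + 2.0 * s0 * dt * tr_fe - s0 ** 2 * tr_e
n_e_rough = 1.0 / s0                           # rough a priori estimate of N_e
print(s0, residual, n_e_rough)
```

The reciprocal of the scalar factor is the rough a priori estimate
of the "effective" number of contributions discussed in Section IV.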


It is of interest to note here that Eq. (25) can be solved
by exactly the same procedure used to solve the Riccati difference
equation of the Kalman filter. This would be an iteration
procedure similar to the one described above except that instead
of using the relations of Eqs. (32b) and (33), the relation
$\Delta P_i = \Delta V^{(i)}$ would be used. $P^{(0)}$ would have
some arbitrary value which would usually be far removed from the
final asymptotic one. As a rough estimate, it can be shown that the
number of iterations that would be necessary for a reasonably
accurate estimate of $P$ would be of the order of $(1/\delta_0)$.
For a high-"predictability" type of estimator, the quantity
$(1/\delta_0)$ could very easily have values in the range from 10
to 100, and this would also be the number of iterations necessary
when using standard Kalman procedure in solving for $P$.
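
The slow convergence of the standard procedure can be seen in the
scalar case, where each cycle simply adds the residual of Eq. (25)
to the current value of P. The sketch below is an illustration
under assumed values (D = 0.01, E = 1, no dynamics, and an
arbitrary starting value far from the solution); the stopping rule
is also an assumption.

```python
def iterate_standard(d, e, df, p0, tol=0.01, max_iter=1000):
    """Iterate P <- P + Delta V for n = 1, with Delta V the l.h.s. of Eq. (25)."""
    p = p0
    for i in range(1, max_iter + 1):
        dv = d + 2.0 * df * p - p * p / e
        p += dv
        if abs(dv) <= tol * p:   # stop when the step is below one percent of P
            return p, i
    return p, max_iter


p_final, n_iter = iterate_standard(d=0.01, e=1.0, df=0.0, p0=1.0)
print(p_final, n_iter)
```

For these values the exact solution is P = 0.1, so that
1/delta_0 = 10, and the number of cycles needed comes out at the
same order of magnitude.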


For the proposed suboptimal iteration procedure [represented
by Eqs. (30) to (33)], it is expected that not more than
3 or 4 iterations will be required (since, usually, it should
be sufficient to calculate $\operatorname{tr}(FP)$ to within a
factor of 2 of its true value). The decrease in the number of
required iterations appears for the following reasons. The first is
that $P^{(0)}$ of Eq. (34) is much closer to the correct $P$ [at
least when it is used in $\operatorname{tr}(FP)$] than the usual
$P^{(0)}$ used in the standard Kalman procedure. The second reason
is that Eq. (32b) is used to calculate $\Delta P_i$ instead of the
relation $\Delta P_i = \Delta V^{(i)}$. Thus, in each iteration
$P^{(i)}$ advances toward its asymptotic value by a bigger step
(a factor of the order of
$\|[(P^{(i)} + \tfrac{1}{2}\Delta P_{i-1})E^{-1} - \Delta F]^{-1}\|$
larger) than it would have in the standard Kalman procedure.


Although the significant simplification in solving for P in 
the suboptimal scheme would seem to be established theoretically, 
it would be of great value to verify it experimentally. This 
would mean initiating a computer program solving various types of 
simulated estimator problems. The implementation of this program, 
however, must be left for a future investigation due to the large 
effort required. 
VI. - CONCLUDING REMARKS


In summarizing the foregoing developments, it can be seen 
that probably the most important result derived is proof that 
the suboptimal estimator using scalar weighting factors can 
produce an estimate which, in most cases of interest, is nearly 
as good as the optimal Kalman estimate. In view of this result, 
the suboptimal scheme, being easier to understand physically, 
can become useful not only as a simplified estimator but also as 
a means of gaining additional insight into the associated Kalman 
filter. One of the results of this new insight is the possibil- 
ity of an a priori evaluation of the effectiveness of a Kalman 
filter, which was shown to be more meaningful in practice than
the designation "optimum".

Another by-product of the new insight is one that might 
appeal to the more practical physicists and engineers. This is 
the ability of being able to provide a rough qualitative descrip- 
tion in physical terms of how a Kalman filter operates. The 
description is now to be presented in what follows, but first it 
is necessary to give a physical interpretation of the suboptimal 
estimation scheme. This is best done by studying Eq. (6) which,
in essence, describes how the estimate, $x_A(k \mid k)$, is constructed.

It has already been established in Section II that $\gamma_{k,i}$
is a scalar weighting factor whose value is given by Eq. (9b). The
second factor, $[\Phi(k,k-i)\,H_{k-i}^{-1}\,z(k-i)]$, on the r.h.s.
of Eq. (6) can be interpreted as the measurement taken at the time,
$t_{k-i}$, converted to an estimate (by $H_{k-i}^{-1}$) and
"updated" to the time, $t_k$ [by $\Phi(k,k-i)$]. It now becomes
possible to give a qualitative explanation of how the estimator
functions. It has already been shown in Section III that when the
"predictability" is low (i.e., $\|D_{k-i}\| \gg \|E_{k-i}\|$),
$\alpha_{k-i}$ becomes very small. Thus, the series in Eq. (6)
converges rapidly with the "effective" number, $N_e$, of terms
being small. This means that only a few of the contributions of the
earlier measurements, $z(k-i)$ (where $i \geq 1$), are used in the
estimate. Most of the contributions are being rejected because of
the unreliability in the "updating" caused by the random forcing
function $u(k-i)$ in the interval, $(t_k - t_{k-i})$. On the other
hand, if the "predictability" is high (i.e., if
$\|D_{k-i}\| \ll \|E_{k-i}\|$), $\alpha_{k-i}$ can be shown to
approach unity. The series in Eq. (6) now is slowly converging and
the "effective" number of terms, $N_e$, is large. This, of course,
implies that contributions from many of the earlier measurements
are being used in the estimate.

Since (as has been shown in Section III) there is a fairly
close agreement between $\hat{x}_A(k \mid k)$ and $\hat{x}(k \mid k)$ [or
actually between $\mathrm{tr}(P_A)$ and $\mathrm{tr}(P)$], it can be
inferred that the main processes taking place in the
$\hat{x}_A(k \mid k)$ estimate should take place in the
$\hat{x}(k \mid k)$ estimate. Thus, as a result of studying the suboptimal
filter, it does become possible to give the rough qualitative 
physical description mentioned above. One of the main functions 
of a Kalman filter is to essentially "update" the measurement 
data of the previous instants of time to a set of values corres- 
ponding to the present instant. These "updated" measurements 
are used along with the current measurements to form an optimum 
estimate of the state vector. It is to be noted that because of 
the random disturbance term, u(k-i), in the "Canonical" equa- 
tions, there is an uncertainty in "updating" the measurements of 
the previous instants. In fact, the earlier the instant, the 
greater will be the uncertainty. The Kalman filter scheme takes 
account of this fact by essentially assigning weighting factors 
to the contributions of the previous instants so that the weight- 
ing factor becomes smaller as the corresponding instant goes back 
in time. The fact that the Kalman filter uses matrix weighting 
factors so that each component in the state vector can be weighted 
individually probably accounts in part for its superiority in 
performance to the suboptimal scheme proposed here (using only 
scalar weighting factors) . 
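
The decay of the weighting factors with the age of the measurement
can be illustrated with a minimal sketch. It assumes a scalar,
time-invariant system in steady state with a unit measurement
matrix; the names phi, q and r and all numerical values are
illustrative assumptions and do not come from the text. Under
those assumptions the Kalman estimate can be written as a weighted
sum of the present and past measurements, with the weight
K[(1 - K)phi]^i applied to z(k - i), where K is the steady-state
gain.

```python
import numpy as np

def steady_state_weights(phi, q, r, n_terms=50, n_iter=500):
    """Weights placed on z(k), z(k-1), ... by a steady-state scalar Kalman filter.

    Hypothetical scalar model: x(k) = phi*x(k-1) + u(k-1), z(k) = x(k) + v(k),
    with var(u) = q and var(v) = r.
    """
    p = q + r                                  # any positive starting value
    for _ in range(n_iter):                    # iterate the Riccati recursion
        p = phi**2 * p * r / (p + r) + q       # predicted variance P(k|k-1)
    gain = p / (p + r)                         # steady-state Kalman gain K
    rho = (1.0 - gain) * phi                   # per-step decay of the weights
    return gain * rho ** np.arange(n_terms), rho

# Large disturbance (low "predictability") vs. small disturbance (high).
w_low, rho_low = steady_state_weights(phi=1.0, q=10.0, r=1.0)
w_high, rho_high = steady_state_weights(phi=1.0, q=0.01, r=1.0)

# One rough measure of the "effective" number of terms (an assumption made
# here for illustration, not the definition of N_e used in the text).
n_eff_low = 1.0 / (1.0 - rho_low)
n_eff_high = 1.0 / (1.0 - rho_high)
```

In both cases the weights fall off monotonically as the instant
goes back in time, and the fall-off is much slower when the
disturbance variance q is small compared with r.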

As seen in Section IV, the more precise the knowledge of the
underlying natural processes (i.e., the greater the
"predictability"), the greater will be the "effective" number,
$N_e$, of measurement vectors used to construct the optimum
estimate, and the more accurate will be that estimate. Thus, it
follows that the Kalman filter is a better estimator than one
using only contemporary measurement data solely because it has
available extra information in the form of partial knowledge of
the natural processes generating the measurement data (i.e., it
has the "Canonical" equations), and it uses this information to
supply itself with extra data derived from the measurements of
the previous instants.
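
The gain in accuracy over an estimator that uses only the current
measurement can be checked numerically. The sketch below again
assumes a hypothetical scalar, time-invariant model with a unit
measurement matrix (phi, q and r are illustrative and not taken
from the text). It compares the steady-state error variance of the
Kalman estimate with r, the error variance obtained when z(k)
alone is taken as the estimate.

```python
def steady_state_error_variance(phi, q, r, n_iter=1000):
    """Steady-state P(k|k) of a scalar Kalman filter (hypothetical model).

    Model: x(k) = phi*x(k-1) + u(k-1), z(k) = x(k) + v(k),
    with var(u) = q and var(v) = r.
    """
    p = q + r                                  # any positive starting value
    for _ in range(n_iter):
        p = phi**2 * p * r / (p + r) + q       # predicted variance P(k|k-1)
    return p * r / (p + r)                     # filtered variance P(k|k)

r = 1.0                                        # variance when only z(k) is used
q_values = (10.0, 1.0, 0.1, 0.01)              # decreasing disturbance variance
variances = [steady_state_error_variance(1.0, q, r) for q in q_values]
```

The filtered variance stays below r in every case and shrinks as q
is reduced, i.e., as the assumed process becomes more predictable.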
APPENDIX A


If M and S are square matrices such that $\|S M^{-1}\| < 1$, then
it is possible to verify that:

$$[M+S]^{-1} = M^{-1} - M^{-1} S M^{-1} + M^{-1} S M^{-1} S M^{-1} - \cdots \tag{1A}$$
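
The expansion in Eq. (1A) can be checked numerically with a short
sketch. The matrices below are arbitrary illustrative values (not
from the text), chosen so that the norm of S M^-1 is below unity;
the function name series_inverse is likewise hypothetical.

```python
import numpy as np

def series_inverse(M, S, n_terms):
    """Sum of the first n_terms terms of the expansion of [M + S]^-1 in Eq. (1A)."""
    M_inv = np.linalg.inv(M)
    term = M_inv.copy()
    total = np.zeros_like(M_inv)
    for _ in range(n_terms):
        total += term
        term = -term @ S @ M_inv        # next term: flip sign, append S M^-1
    return total

M = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 6.0]])
S = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]])

ratio = np.linalg.norm(S @ np.linalg.inv(M), 2)     # convergence requires < 1
exact = np.linalg.inv(M + S)
errors = [np.linalg.norm(series_inverse(M, S, n) - exact, 2) for n in (1, 2, 4, 8)]
```

The entries of errors give the truncation error after 1, 2, 4 and
8 terms; the two-term truncation is the one used in Eq. (3A).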


Thus, in evaluating the matrix $W_k$ of the text, it is to be
noted that

$$W_k = [H_k^{-1} R_k]\,[H_k P_k H_k^T + R_k]^{-1}\,[H_k] \tag{2A}$$

where $P_k = P(k \mid k-1)$.


If $\|R_k (H_k P_k H_k^T)^{-1}\| < 1$, using the approximation
developed in Eq. (1A), $W_k$ becomes

$$W_k \cong [H_k^{-1} R_k]\,[H_k P_k H_k^T]^{-1}\,[I - R_k (H_k P_k H_k^T)^{-1}]\,H_k \tag{3A}$$

where only the first two terms in Eq. (1A) have been included.

With the use of matrix algebra, Eq. (3A) can be brought to
the form:

$$W_k \cong H_k^{-1} R_k (H_k^{-1})^T P_k^{-1}\,[I - H_k^{-1} R_k (H_k^{-1})^T P_k^{-1}] \tag{4A}$$


Using the first term of the approximation of Eq. (4A) in Eqs.
(3a) and (3b) of the text, the result becomes:

$$P_{k+1} \cong \Phi(k+1,k)\,H_k^{-1} R_k (H_k^{-1})^T\,\Phi^T(k+1,k) + \Gamma_{k+1,k}\,Q_k\,\Gamma_{k+1,k}^T \tag{5A}$$
In view of the initial assumption, the first term in the
r.h.s. of Eq. (5A) is small compared to the second. Thus, $P_k$
can be approximately given by $P_k^*$, which is defined as:

$$P_k^* = H_k^{-1} R_k (H_k^{-1})^T + \Gamma_{k,k-1}\,Q_{k-1}\,\Gamma_{k,k-1}^T \tag{6A}$$

Upon substituting $P_k^*$ from Eq. (6A) in Eq. (4A), the result will
be Eq. (10) of the text.
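
The chain of approximations from Eq. (2A) to Eq. (4A) can be
checked numerically. The sketch below uses arbitrary 2x2 values
for H, P and R (illustrative assumptions, not from the text), with
R small enough that the condition attached to Eq. (3A) holds. It
also compares Eq. (2A) with I - K H for the standard Kalman gain
K; the two agree algebraically when H is invertible, but whether
that is how the text defines W_k is an assumption here.

```python
import numpy as np

# Illustrative values only; H must be square and invertible.
H = np.array([[2.0, 0.5], [0.0, 1.0]])
P = np.array([[4.0, 1.0], [1.0, 3.0]])      # stands in for P(k|k-1)
R = 0.05 * np.eye(2)                         # small measurement-noise covariance
I = np.eye(2)

H_inv = np.linalg.inv(H)
P_inv = np.linalg.inv(P)
HPH_inv = np.linalg.inv(H @ P @ H.T)
condition = np.linalg.norm(R @ HPH_inv, 2)   # must be below 1 for Eq. (3A)

W_exact = H_inv @ R @ np.linalg.inv(H @ P @ H.T + R) @ H     # Eq. (2A)
W_3a = H_inv @ R @ HPH_inv @ (I - R @ HPH_inv) @ H           # Eq. (3A)
E = H_inv @ R @ H_inv.T                                       # H^-1 R (H^-1)^T
W_4a = E @ P_inv @ (I - E @ P_inv)                            # Eq. (4A)

# Standard Kalman gain, used only for the comparison with I - K H.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
rel_error = np.linalg.norm(W_4a - W_exact) / np.linalg.norm(W_exact)
```

For these values Eqs. (3A) and (4A) coincide to rounding error,
and rel_error measures how far the two-term approximation departs
from Eq. (2A).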
APPENDIX B


Assuming that M and S are positive definite symmetric square
matrices, it follows from matrix theory that*:

$$M = U \Lambda_m U^T \quad \text{and} \quad S = T \Lambda_s T^T \tag{1B}$$

where U and T are unitary matrices and $\Lambda_m$ and $\Lambda_s$ are
diagonal matrices. It also follows from matrix theory that*:


$$\mathrm{tr}(MS) = \mathrm{tr}(M^* \Lambda_s) \tag{2B}$$

where

$$M^* = T^T M T = (T^T U)\,\Lambda_m\,(U^T T)$$

It is to be noted that the product $(T^T U)$ is also a unitary
matrix* and, thus, $M^*$ is also a positive definite symmetric matrix.

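Eq. (2B) can be rewritten as:

$$\mathrm{tr}(MS) = \sum_i m^*_{ii}\,\sigma_i \tag{3B}$$

where $m^*_{ii}$ are the diagonal elements of $M^*$ and $\sigma_i$ are the
diagonal elements of $\Lambda_s$. Since $\sigma_i > 0$, it follows that:

$$(m)_M \sum_i \sigma_i \;\ge\; \mathrm{tr}(MS) \;\ge\; (m)_m \sum_i \sigma_i \tag{4B}$$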


*See "Introduction to Matrix Analysis" by R. Bellman, pp. 38 and 
95, McGraw-Hill Book Co., Inc., 1960. 
where $(m)_M$ is the maximum of the $m^*_{ii}$ elements and $(m)_m$ is
the minimum. Moreover, it follows from Eq. (1B) that*:

$$m^*_{ii} = \sum_j a_{ij}^2\,\mu_j \tag{5B}$$

where

$$\sum_j a_{ij}^2 = 1$$

The $a_{ij}$ are the elements of $(T^T U)$ and $\mu_j$ are the diagonal
elements of $\Lambda_m$. Hence, it is readily seen that:



$$(\mu)_M \;\ge\; m^*_{ii} \;\ge\; (\mu)_m \tag{6B}$$

where $(\mu)_M$ is the maximum of the $\mu_j$ elements and $(\mu)_m$ is
the minimum.

Using Eqs. (4B) and (6B) and identifying M with (I - B) and
S with AP, it can easily be seen that Eq. (24) of the text will
result.
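
The bounds obtained by combining Eqs. (4B) and (6B) can be
illustrated numerically. The sketch below builds two arbitrary
symmetric positive definite matrices (random illustrative values,
not the (I - B) and AP of the text) and checks that tr(MS) lies
between the smallest and largest eigenvalue of M multiplied by
tr(S).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
M = A @ A.T + 0.1 * np.eye(4)         # symmetric positive definite
S = B @ B.T + 0.1 * np.eye(4)         # symmetric positive definite

mu = np.linalg.eigvalsh(M)            # diagonal elements of Lambda_m
sigma, T = np.linalg.eigh(S)          # S = T diag(sigma) T^T, as in Eq. (1B)
M_star = T.T @ M @ T                  # the matrix M* of Eq. (2B)
m_star = np.diag(M_star)              # its diagonal elements

trace_MS = np.trace(M @ S)
lower = mu.min() * sigma.sum()        # smallest eigenvalue of M times tr(S)
upper = mu.max() * sigma.sum()        # largest eigenvalue of M times tr(S)
```

Here trace_MS equals the weighted sum of Eq. (3B), and each
diagonal element of M* falls between the extreme eigenvalues of M.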


APPENDIX C


It is to be assumed here that systems to be considered are 
only those whose associated Kalman filters are completely con- 
trollable and completely observable. If this is true, it becomes 
apparent by considering the definition of observability* (at 
least in the case where the matrices of the Canonical equations 
are slowly varying) that the relations developed in the following 
section can be valid. 

First, a new measurement vector $z^*(k)$ must be defined so
that it has a dimensionality of n instead of the m dimensionality
of $z(k)$ (where m < n). Thus:

$$z^*(k) = \begin{bmatrix} z(k) \\ \Phi(k,k-1)\,z(k-1) \\ \vdots \\ \Phi(k,k-s+1)\,\bar{z}(k-s+1) \end{bmatrix} \tag{1C}$$


The integer, s, is chosen in Eq. (1C) so that $n/m \le s < n/m + 1$.
$\bar{z}(k-s+1)$ is a vector using only the first p components of
$z(k-s+1)$, where $p = n - (s-1)m$. It would follow that it is
possible to construct a new n×n matrix $H_k^*$ which has an inverse
and is defined by:

$$z^*(k) = H_k^*\,x(k) + v^*(k) \tag{2C}$$

where

$$v^*(k) = \begin{bmatrix} v(k) \\ \Phi(k,k-1)\,v(k-1) \\ \vdots \\ \Phi(k,k-s+1)\,\bar{v}(k-s+1) \end{bmatrix}$$


*See "Optimal Estimation Identification and Control" by Robert 
C.K. Lee, pp. 82-83, M.I.T. Press, Cambridge, Mass., 1964. 
In terms of the m×n matrix $H_k$, the new matrix $H_k^*$ would
have the explicit form:

$$H_k^* = \begin{bmatrix} H_k \\ \Phi(k,k-1)\,H_{k-1} \\ \vdots \\ \Phi(k,k-s+1)\,\bar{H}_{k-s+1} \end{bmatrix} \tag{3C}$$

where $\bar{H}_{k-s+1}$ is the reduced p×n matrix formed by using the
first p rows of $H_{k-s+1}$.

By using $z^*(k-i)$, $H^*_{k-i}$ and $R^*_{k-i}$ instead of $z(k-i)$,
$H_{k-i}$ and $R_{k-i}$ in the estimator systems, it is seen that the
treatment developed in the text can be extended to the general
case. However, it is apparent that by using this scheme estimates
cannot be made at every time $t_{k-i}$ (where i = 0, 1, 2, ...);
instead, the estimates would be $s\,\Delta t$ apart (i.e., estimates
would occur at the times $t_{k-i}$ where i = 0, s, 2s, 3s, ...).
Moreover, since some of the $z(k-i)$ data is thrown away [to form
$\bar{z}(k-s+1)$] the accuracy of the estimate would be reduced.
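
The choice of s and p in Eq. (1C), and the resulting spacing of
the estimates, can be made concrete with a small sketch. The
function names below are hypothetical and serve only to
illustrate the arithmetic.

```python
def stacking_parameters(n, m):
    """Return (s, p) for state dimension n and measurement dimension m < n.

    s is the number of measurement instants stacked into z*(k), chosen so
    that n/m <= s < n/m + 1; p = n - (s - 1)*m is the number of components
    kept from the oldest measurement, z(k - s + 1).
    """
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    s = -(-n // m)                 # integer ceiling of n/m
    p = n - (s - 1) * m
    return s, p

def estimate_offsets(s, count):
    """Indices i at which estimates are available: i = 0, s, 2s, ..."""
    return [j * s for j in range(count)]

s, p = stacking_parameters(n=5, m=2)   # three instants, one component of the oldest
offsets = estimate_offsets(s, 4)
```

With n = 5 and m = 2, three successive measurement vectors are
needed and only one component of the oldest is retained, so one of
its two components is discarded.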